There were 18 raw data sets to import and append into a single raw data file. The code below was therefore adapted and re-used from https://stackoverflow.com/questions/51321021/how-to-read-multiple-csv-files-in-a-directory-through-python-csv-function.
import glob #glob module provides a function for making file lists from directory wildcard searches
import os #Imports the OS interface module
import pandas as pd
folder_name = 'Crime Data' #specifies the folder in which the raw data sets are available
file_type = 'csv' #specifies the type of raw data files
separator = ',' #specifies the data separator in the file
#The list comprehension below reads every file in the specified folder.
#Then the files are concatenated and assigned to a data frame called 'Crimes'
Crimes = pd.concat([pd.read_csv(f, sep=separator) for f in glob.glob(folder_name + "/*." + file_type)], ignore_index=True)
Crimes.head()
| | Crime ID | Month | Reported by | Falls within | Longitude | Latitude | Location | LSOA code | LSOA name | Crime type | Last outcome category | Context |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1bbc1435480452362391a8831eef45234893be2e5dc8fc... | 2019-06 | Metropolitan Police Service | Metropolitan Police Service | -0.662250 | 50.785967 | On or near Theme/Adventure Park | E01031431 | Arun 017A | Violence and sexual offences | Status update unavailable | NaN |
| 1 | bfc49d681f6f9e28418ab56d3d17de74c8929ffe0cb55a... | 2019-06 | Metropolitan Police Service | Metropolitan Police Service | -0.803131 | 51.829656 | On or near Bullfinch Gardens | E01017660 | Aylesbury Vale 011B | Other crime | Investigation complete; no suspect identified | NaN |
| 2 | c130c1b66528ec4d7de9f416ac29cb15c702a3e796c60f... | 2019-06 | Metropolitan Police Service | Metropolitan Police Service | 0.751540 | 52.031477 | On or near Sparrow Road | E01029887 | Babergh 008B | Violence and sexual offences | Status update unavailable | NaN |
| 3 | NaN | 2019-06 | Metropolitan Police Service | Metropolitan Police Service | 0.134947 | 51.588063 | On or near Mead Grove | E01000027 | Barking and Dagenham 001A | Anti-social behaviour | NaN | NaN |
| 4 | NaN | 2019-06 | Metropolitan Police Service | Metropolitan Police Service | 0.135866 | 51.587336 | On or near Gibbfield Close | E01000027 | Barking and Dagenham 001A | Anti-social behaviour | NaN | NaN |
Crimes.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 987370 entries, 0 to 987369
Data columns (total 12 columns):
 #   Column                 Non-Null Count   Dtype
---  ------                 --------------   -----
 0   Crime ID               700151 non-null  object
 1   Month                  987370 non-null  object
 2   Reported by            987370 non-null  object
 3   Falls within           987370 non-null  object
 4   Longitude              976866 non-null  float64
 5   Latitude               976866 non-null  float64
 6   Location               987370 non-null  object
 7   LSOA code              976866 non-null  object
 8   LSOA name              976866 non-null  object
 9   Crime type             987370 non-null  object
 10  Last outcome category  700151 non-null  object
 11  Context                0 non-null       float64
dtypes: float64(3), object(9)
memory usage: 90.4+ MB
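As a side note, a variant of the import pattern above can tag each row with the file it came from, which makes it easier to trace an anomaly back to a specific monthly extract. This is a minimal sketch; the `load_folder` helper and the 'Source file' column are illustrative, not part of the original code.

```python
import glob
import os
import pandas as pd

def load_folder(folder_name, file_type='csv'):
    """Read every data file in a folder, concatenate them, and
    tag each row with the name of the file it came from."""
    frames = []
    for path in sorted(glob.glob(os.path.join(folder_name, '*.' + file_type))):
        df = pd.read_csv(path)
        df['Source file'] = os.path.basename(path)  # record provenance
        frames.append(df)
    return pd.concat(frames, ignore_index=True)
```

With the 18 monthly files in 'Crime Data', `load_folder('Crime Data')` would return the same concatenated frame as above, plus a 'Source file' column.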
The fields 'Crime ID' and 'Context' are clearly not useful for this analysis. Hence, they are dropped using the below code.
Crimes2=Crimes.drop(['Crime ID', 'Context'], axis = 1)
Crimes2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 987370 entries, 0 to 987369
Data columns (total 10 columns):
 #   Column                 Non-Null Count   Dtype
---  ------                 --------------   -----
 0   Month                  987370 non-null  object
 1   Reported by            987370 non-null  object
 2   Falls within           987370 non-null  object
 3   Longitude              976866 non-null  float64
 4   Latitude               976866 non-null  float64
 5   Location               987370 non-null  object
 6   LSOA code              976866 non-null  object
 7   LSOA name              976866 non-null  object
 8   Crime type             987370 non-null  object
 9   Last outcome category  700151 non-null  object
dtypes: float64(2), object(8)
memory usage: 75.3+ MB
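The frame still uses about 75 MB. One optional optimisation, not applied in the original analysis, is to cast low-cardinality text columns such as 'Falls within' or 'Crime type' to the pandas 'category' dtype, which stores each distinct string only once. A self-contained sketch with made-up data:

```python
import pandas as pd

# Toy frame mimicking the heavily repeated text values in the crime data
df = pd.DataFrame({
    'Crime type': ['Burglary', 'Drugs', 'Burglary'] * 1000,
    'Falls within': ['Metropolitan Police Service'] * 3000,
})
before = df.memory_usage(deep=True).sum()
for col in ['Crime type', 'Falls within']:
    df[col] = df[col].astype('category')  # store each distinct string once
after = df.memory_usage(deep=True).sum()  # substantially smaller than 'before'
```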
#Importing some packages that would be needed to carry out further analysis and visualisations
from matplotlib import pyplot as plt
import warnings
import seaborn as sns
%matplotlib inline
warnings.filterwarnings('ignore')
import numpy as np
The data frames had to be re-arranged to carry out various computations and visualisations, which required grouping by certain variables and aggregating in a single step. The following links were referred to when compiling the code to perform these tasks: https://www.statology.org/pandas-groupby-aggregate-multiple-columns/ and https://jamesrledoux.com/code/group-by-aggregate-pandas
#Obtaining the crime counts by period and police area to get an initial idea about the data totals
Crimes3=Crimes2.groupby(['Month', 'Falls within']).agg({'Crime type': ['count']})
Crimes3.columns = ['Total Crimes Count']
Crimes3 = Crimes3.reset_index()
Crimes3.sort_values(by=['Falls within','Month'], inplace=True)
print(Crimes3)
      Month                 Falls within  Total Crimes Count
0   2019-06  Metropolitan Police Service               95942
2   2019-07  Metropolitan Police Service               99329
4   2019-08  Metropolitan Police Service               91534
6   2020-06  Metropolitan Police Service              100198
8   2020-07  Metropolitan Police Service              103657
10  2020-08  Metropolitan Police Service              104782
12  2021-06  Metropolitan Police Service               95527
14  2021-07  Metropolitan Police Service               94605
16  2021-08  Metropolitan Police Service               89240
1   2019-06           South Wales Police               12068
3   2019-07           South Wales Police               13063
5   2019-08           South Wales Police               12254
7   2020-06           South Wales Police               13357
9   2020-07           South Wales Police               13098
11  2020-08           South Wales Police               12812
13  2021-06           South Wales Police               12164
15  2021-07           South Wales Police               11991
17  2021-08           South Wales Police               11749
#To check if the aggregated field has been included in the dataframe properly
Crimes3.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 18 entries, 0 to 17
Data columns (total 3 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Month               18 non-null     object
 1   Falls within        18 non-null     object
 2   Total Crimes Count  18 non-null     int64
dtypes: int64(1), object(2)
memory usage: 576.0+ bytes
#Checking if the data type was changed during aggregation
print (type(Crimes3))
<class 'pandas.core.frame.DataFrame'>
The re-arranged data frame was used to draw a line chart to visualise the trends in the crime totals over the periods for the two police areas. The weblink https://seaborn.pydata.org/tutorial/aesthetics.html was referred to when compiling the code.
import seaborn as sns
sns.set_theme(style = "darkgrid")
sns.set_context("paper") #Scaling plot elements. The default context is 'notebook'.
#However, the graph produced did not show the x-axis values clearly.
#Hence, the 'paper' context was used for a different scaling of plot elements.
sns.lineplot(x = "Month", y = "Total Crimes Count", hue = "Falls within", data = Crimes3)
<AxesSubplot:xlabel='Month', ylabel='Total Crimes Count'>
#Further data grouping and aggregations
Crimes4=Crimes2.groupby(['Month', 'Falls within', 'Crime type']).agg({'Crime type': ['count']})
Crimes4.columns = ['Total Crimes Count']
Crimes4 = Crimes4.reset_index()
Crimes4.sort_values(by=['Crime type','Falls within','Month'], inplace=True)
Crimes4
| | Month | Falls within | Crime type | Total Crimes Count |
|---|---|---|---|---|
| 0 | 2019-06 | Metropolitan Police Service | Anti-social behaviour | 22075 |
| 28 | 2019-07 | Metropolitan Police Service | Anti-social behaviour | 25718 |
| 56 | 2019-08 | Metropolitan Police Service | Anti-social behaviour | 23567 |
| 84 | 2020-06 | Metropolitan Police Service | Anti-social behaviour | 39584 |
| 112 | 2020-07 | Metropolitan Police Service | Anti-social behaviour | 35588 |
| ... | ... | ... | ... | ... |
| 139 | 2020-07 | South Wales Police | Violence and sexual offences | 3621 |
| 167 | 2020-08 | South Wales Police | Violence and sexual offences | 3873 |
| 195 | 2021-06 | South Wales Police | Violence and sexual offences | 4052 |
| 223 | 2021-07 | South Wales Police | Violence and sexual offences | 4083 |
| 251 | 2021-08 | South Wales Police | Violence and sexual offences | 4114 |
252 rows × 4 columns
#Viewing the summary statistics of the full data set. Here, 'top' is the most common value
Crimes2.describe(include='all')
| | Month | Reported by | Falls within | Longitude | Latitude | Location | LSOA code | LSOA name | Crime type | Last outcome category |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 987370 | 987370 | 987370 | 976866.000000 | 976866.000000 | 987370 | 976866 | 976866 | 987370 | 700151 |
| unique | 9 | 2 | 2 | NaN | NaN | 42705 | 7403 | 7403 | 14 | 14 |
| top | 2020-08 | Metropolitan Police Service | Metropolitan Police Service | NaN | NaN | On or near Supermarket | E01033595 | Westminster 013E | Anti-social behaviour | Investigation complete; no suspect identified |
| freq | 117594 | 874814 | 874814 | NaN | NaN | 31734 | 4703 | 4703 | 287219 | 325314 |
| mean | NaN | NaN | NaN | -0.497920 | 51.518152 | NaN | NaN | NaN | NaN | NaN |
| std | NaN | NaN | NaN | 1.076762 | 0.082902 | NaN | NaN | NaN | NaN | NaN |
| min | NaN | NaN | NaN | -5.539455 | 50.120985 | NaN | NaN | NaN | NaN | NaN |
| 25% | NaN | NaN | NaN | -0.268125 | 51.474047 | NaN | NaN | NaN | NaN | NaN |
| 50% | NaN | NaN | NaN | -0.128996 | 51.517206 | NaN | NaN | NaN | NaN | NaN |
| 75% | NaN | NaN | NaN | -0.045220 | 51.561705 | NaN | NaN | NaN | NaN | NaN |
| max | NaN | NaN | NaN | 1.741738 | 55.719400 | NaN | NaN | NaN | NaN | NaN |
#Extracting all the unique types of crime in the data sets
unique = set(Crimes2['Crime type'])
unique
{'Anti-social behaviour',
'Bicycle theft',
'Burglary',
'Criminal damage and arson',
'Drugs',
'Other crime',
'Other theft',
'Possession of weapons',
'Public order',
'Robbery',
'Shoplifting',
'Theft from the person',
'Vehicle crime',
'Violence and sexual offences'}
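An equivalent way to list the unique crime types, which additionally shows how often each occurs, is `value_counts()`. A quick sketch on a toy series standing in for `Crimes2['Crime type']`:

```python
import pandas as pd

# Toy series standing in for Crimes2['Crime type']
crimes = pd.Series(['Burglary', 'Drugs', 'Burglary', 'Robbery'],
                   name='Crime type')
counts = crimes.value_counts()    # unique values with their frequencies
unique_types = set(counts.index)  # same result as set(crimes)
```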
#Viewing the summary statistics of the data filtered for the Metropolitan Police Service area only
Crimes2_Metropolitan = Crimes2[Crimes2['Falls within']=='Metropolitan Police Service']
Crimes2_Metropolitan.describe(include='all')
| | Month | Reported by | Falls within | Longitude | Latitude | Location | LSOA code | LSOA name | Crime type | Last outcome category |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 874814 | 874814 | 874814 | 866893.000000 | 866893.000000 | 874814 | 866893 | 866893 | 874814 | 615267 |
| unique | 9 | 1 | 1 | NaN | NaN | 34971 | 6478 | 6478 | 14 | 11 |
| top | 2020-08 | Metropolitan Police Service | Metropolitan Police Service | NaN | NaN | On or near Supermarket | E01033595 | Westminster 013E | Anti-social behaviour | Investigation complete; no suspect identified |
| freq | 104782 | 874814 | 874814 | NaN | NaN | 29097 | 4703 | 4703 | 259547 | 298173 |
| mean | NaN | NaN | NaN | -0.119715 | 51.511756 | NaN | NaN | NaN | NaN | NaN |
| std | NaN | NaN | NaN | 0.155676 | 0.078811 | NaN | NaN | NaN | NaN | NaN |
| min | NaN | NaN | NaN | -5.539455 | 50.120985 | NaN | NaN | NaN | NaN | NaN |
| 25% | NaN | NaN | NaN | -0.201725 | 51.470367 | NaN | NaN | NaN | NaN | NaN |
| 50% | NaN | NaN | NaN | -0.111868 | 51.515746 | NaN | NaN | NaN | NaN | NaN |
| 75% | NaN | NaN | NaN | -0.030012 | 51.554998 | NaN | NaN | NaN | NaN | NaN |
| max | NaN | NaN | NaN | 1.741738 | 55.719400 | NaN | NaN | NaN | NaN | NaN |
#Viewing the summary statistics of the data filtered for the South Wales Police area only
Crimes2_SouthWales = Crimes2[Crimes2['Falls within']=='South Wales Police']
Crimes2_SouthWales.describe(include='all')
| | Month | Reported by | Falls within | Longitude | Latitude | Location | LSOA code | LSOA name | Crime type | Last outcome category |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 112556 | 112556 | 112556 | 109973.000000 | 109973.000000 | 112556 | 109973 | 109973 | 112556 | 84884 |
| unique | 9 | 1 | 1 | NaN | NaN | 9753 | 957 | 957 | 14 | 14 |
| top | 2020-06 | South Wales Police | South Wales Police | NaN | NaN | On or near Parking Area | W01001939 | Cardiff 032F | Violence and sexual offences | Unable to prosecute suspect |
| freq | 13357 | 112556 | 112556 | NaN | NaN | 3607 | 2419 | 2419 | 34262 | 37744 |
| mean | NaN | NaN | NaN | -3.479226 | 51.568572 | NaN | NaN | NaN | NaN | NaN |
| std | NaN | NaN | NaN | 0.303403 | 0.096037 | NaN | NaN | NaN | NaN | NaN |
| min | NaN | NaN | NaN | -4.366143 | 50.543140 | NaN | NaN | NaN | NaN | NaN |
| 25% | NaN | NaN | NaN | -3.786046 | 51.486166 | NaN | NaN | NaN | NaN | NaN |
| 50% | NaN | NaN | NaN | -3.384110 | 51.557267 | NaN | NaN | NaN | NaN | NaN |
| 75% | NaN | NaN | NaN | -3.198205 | 51.643326 | NaN | NaN | NaN | NaN | NaN |
| max | NaN | NaN | NaN | 0.211229 | 53.829399 | NaN | NaN | NaN | NaN | NaN |
#Splitting the original Month column which contains both Year and Month into separate Month and Year columns
Crimes2[['Year','Month of Crime']] = Crimes2.Month.str.split('-', expand = True)
Crimes2.sort_values(by=['Crime type','Falls within'], inplace=True)
Crimes2
| | Month | Reported by | Falls within | Longitude | Latitude | Location | LSOA code | LSOA name | Crime type | Last outcome category | Year | Month of Crime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-06 | Metropolitan Police Service | Metropolitan Police Service | 0.134947 | 51.588063 | On or near Mead Grove | E01000027 | Barking and Dagenham 001A | Anti-social behaviour | NaN | 2019 | 06 |
| 1 | 2019-06 | Metropolitan Police Service | Metropolitan Police Service | 0.135866 | 51.587336 | On or near Gibbfield Close | E01000027 | Barking and Dagenham 001A | Anti-social behaviour | NaN | 2019 | 06 |
| 2 | 2019-06 | Metropolitan Police Service | Metropolitan Police Service | 0.133181 | 51.586719 | On or near Rosehatch Avenue | E01000027 | Barking and Dagenham 001A | Anti-social behaviour | NaN | 2019 | 06 |
| 3 | 2019-06 | Metropolitan Police Service | Metropolitan Police Service | 0.137065 | 51.583672 | On or near Police Station | E01000027 | Barking and Dagenham 001A | Anti-social behaviour | NaN | 2019 | 06 |
| 4 | 2019-06 | Metropolitan Police Service | Metropolitan Police Service | 0.134947 | 51.588063 | On or near Mead Grove | E01000027 | Barking and Dagenham 001A | Anti-social behaviour | NaN | 2019 | 06 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 987365 | 2021-08 | South Wales Police | South Wales Police | NaN | NaN | No Location | NaN | NaN | Violence and sexual offences | Awaiting court outcome | 2021 | 08 |
| 987366 | 2021-08 | South Wales Police | South Wales Police | NaN | NaN | No Location | NaN | NaN | Violence and sexual offences | Investigation complete; no suspect identified | 2021 | 08 |
| 987367 | 2021-08 | South Wales Police | South Wales Police | NaN | NaN | No Location | NaN | NaN | Violence and sexual offences | Under investigation | 2021 | 08 |
| 987368 | 2021-08 | South Wales Police | South Wales Police | NaN | NaN | No Location | NaN | NaN | Violence and sexual offences | Under investigation | 2021 | 08 |
| 987369 | 2021-08 | South Wales Police | South Wales Police | NaN | NaN | No Location | NaN | NaN | Violence and sexual offences | Awaiting court outcome | 2021 | 08 |
987370 rows × 12 columns
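An alternative to the string split, assuming every 'Month' value really is a well-formed YYYY-MM string, is to parse the column with `pd.to_datetime` and pull the parts out via the `.dt` accessor; this validates the format and yields integer year and month directly. A sketch on toy data, not the code used above:

```python
import pandas as pd

# Toy frame with the same 'Month' format as the crime data
df = pd.DataFrame({'Month': ['2019-06', '2020-07', '2021-08']})
parsed = pd.to_datetime(df['Month'], format='%Y-%m')  # raises if malformed
df['Year'] = parsed.dt.year             # integer year, no later astype needed
df['Month of Crime'] = parsed.dt.month  # integer month
```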
#Grouping by and aggregating to create a data frame to draw another line chart by Year of Summer and Police Area
Crimes5=Crimes2.groupby(['Year', 'Falls within']).agg({'Crime type': ['count']})
Crimes5.columns = ['Total Crimes Count']
Crimes5 = Crimes5.reset_index()
Crimes5.sort_values(by=['Falls within','Year'], inplace=True)
print(Crimes5)
   Year                 Falls within  Total Crimes Count
0  2019  Metropolitan Police Service              286805
2  2020  Metropolitan Police Service              308637
4  2021  Metropolitan Police Service              279372
1  2019           South Wales Police               37385
3  2020           South Wales Police               39267
5  2021           South Wales Police               35904
import seaborn as sns
sns.set_theme(style = "darkgrid")
sns.set_context("paper") #Scaling plot elements.Same reference as given in the previous code.
sns.lineplot(x = "Year", y = "Total Crimes Count", hue = "Falls within", data = Crimes5)
<AxesSubplot:xlabel='Year', ylabel='Total Crimes Count'>
#Grouping by and aggregating data to create a new data frame to draw a grouped bar chart
Crimes6=Crimes2.groupby(['Falls within','Year','Crime type']).agg({'Crime type': ['count']})
Crimes6.columns = ['Total Crimes Count']
Crimes6 = Crimes6.reset_index()
Crimes6.sort_values(by=['Falls within','Year','Crime type'], inplace=True)
print(Crimes6.head(17))
Falls within Year Crime type \
0 Metropolitan Police Service 2019 Anti-social behaviour
1 Metropolitan Police Service 2019 Bicycle theft
2 Metropolitan Police Service 2019 Burglary
3 Metropolitan Police Service 2019 Criminal damage and arson
4 Metropolitan Police Service 2019 Drugs
5 Metropolitan Police Service 2019 Other crime
6 Metropolitan Police Service 2019 Other theft
7 Metropolitan Police Service 2019 Possession of weapons
8 Metropolitan Police Service 2019 Public order
9 Metropolitan Police Service 2019 Robbery
10 Metropolitan Police Service 2019 Shoplifting
11 Metropolitan Police Service 2019 Theft from the person
12 Metropolitan Police Service 2019 Vehicle crime
13 Metropolitan Police Service 2019 Violence and sexual offences
14 Metropolitan Police Service 2020 Anti-social behaviour
15 Metropolitan Police Service 2020 Bicycle theft
16 Metropolitan Police Service 2020 Burglary
Total Crimes Count
0 71360
1 5730
2 18328
3 13070
4 12195
5 2494
6 29949
7 1555
8 13711
9 9348
10 10641
11 12565
12 26843
13 59016
14 111014
15 8400
16 14243
conda install -c plotly plotly=5.5.0
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done
# All requested packages already installed.
Note: you may need to restart the kernel to use updated packages.
#Importing the below to troubleshoot an issue of charts drawn using plotly disappearing from the notebook
import plotly.io as pio
pio.renderers.default='notebook'
Grouped bar charts were drawn below for each police area, by year, grouped by crime type. plotly.express was used as it produced better-formatted charts than those drawn using matplotlib. Code reference: https://towardsdatascience.com/how-to-create-a-grouped-bar-chart-with-plotly-express-in-python-e2b64ed4abd7
#Filtering the data set for Metropolitan Police Service data only
Crimes6_Met = Crimes6[Crimes6['Falls within']=='Metropolitan Police Service']
#Drawing the grouped bar chart
import plotly.express as px
fig = px.bar(Crimes6_Met, x="Year", color="Crime type",
y='Total Crimes Count',
title="Count of Crimes by Crime Type in Metropolitan Police Area",
barmode='group',
height=600)
fig.show()
#Filtering the data set for South Wales Police data only
Crimes6_SW = Crimes6[Crimes6['Falls within']=='South Wales Police']
#Drawing the grouped bar chart
import plotly.express as px
fig = px.bar(Crimes6_SW, x="Year", color="Crime type",
y='Total Crimes Count',
title="Count of Crimes by Crime Type in South Wales Police Area",
barmode='group',
height=600)
fig.show()
Further to the previous analysis carried out, this section includes a comparison of the crime counts of the two police areas normalised for population. Here, a hypothesis is also tested.
#Grouping by and aggregating the data to prepare a data set for normalising crime counts for population
Crimes_norm = Crimes2.groupby(['Falls within','Year', 'Month of Crime', 'Crime type']).agg({'Crime type': ['count']})
Crimes_norm.columns = ['Total Crimes Count']
Crimes_norm = Crimes_norm.reset_index()
Crimes_norm.sort_values(by=['Falls within','Year', 'Month of Crime', 'Crime type'], inplace=True)
print(Crimes_norm.head())
Falls within Year Month of Crime \
0 Metropolitan Police Service 2019 06
1 Metropolitan Police Service 2019 06
2 Metropolitan Police Service 2019 06
3 Metropolitan Police Service 2019 06
4 Metropolitan Police Service 2019 06
Crime type Total Crimes Count
0 Anti-social behaviour 22075
1 Bicycle theft 1920
2 Burglary 6311
3 Criminal damage and arson 4596
4 Drugs 3736
The populations of inner London (excluding the City of London) and outer London were taken for the Metropolitan Police Service area. The populations of Swansea, Neath Port Talbot, Rhondda Cynon Taff, Bridgend, Vale of Glamorgan, Cardiff, Newport, Monmouthshire, Torfaen, Blaenau Gwent, Caerphilly and Merthyr Tydfil were taken for South Wales. The 2021 population was assumed to be the same as in 2020 since the 2021 figures were unavailable.
#Importing population data
population = pd.read_csv('PopulationData.csv')
population.head()
| | Area | Year | Population |
|---|---|---|---|
| 0 | Metropolitan Police Service | 2019 | 8952268 |
| 1 | Metropolitan Police Service | 2020 | 8991550 |
| 2 | Metropolitan Police Service | 2021 | 8991550 |
| 3 | South Wales Police | 2019 | 1933601 |
| 4 | South Wales Police | 2020 | 1943476 |
#To merge the two data frames, the 'Year' field has to be converted to integer type, since the corresponding field in the other data frame is integer
Crimes_norm['Year']=Crimes_norm['Year'].astype(int)
#Merging the data frames to include population data
Crimes_norm_merge = pd.merge(Crimes_norm, population, how='left', left_on=['Falls within','Year'], right_on = ['Area','Year'])
Crimes_norm_merge.head()
| | Falls within | Year | Month of Crime | Crime type | Total Crimes Count | Area | Population |
|---|---|---|---|---|---|---|---|
| 0 | Metropolitan Police Service | 2019 | 06 | Anti-social behaviour | 22075 | Metropolitan Police Service | 8952268 |
| 1 | Metropolitan Police Service | 2019 | 06 | Bicycle theft | 1920 | Metropolitan Police Service | 8952268 |
| 2 | Metropolitan Police Service | 2019 | 06 | Burglary | 6311 | Metropolitan Police Service | 8952268 |
| 3 | Metropolitan Police Service | 2019 | 06 | Criminal damage and arson | 4596 | Metropolitan Police Service | 8952268 |
| 4 | Metropolitan Police Service | 2019 | 06 | Drugs | 3736 | Metropolitan Police Service | 8952268 |
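One way to confirm that the left merge found a population row for every crime record is the `indicator` parameter of `pd.merge`, which labels each row 'both', 'left_only' or 'right_only'. A self-contained sketch with dummy frames mirroring the merge keys used above:

```python
import pandas as pd

# Dummy frames mirroring the merge keys used in the analysis
crimes = pd.DataFrame({'Falls within': ['A', 'B'], 'Year': [2019, 2019],
                       'Total Crimes Count': [10, 20]})
population = pd.DataFrame({'Area': ['A', 'B'], 'Year': [2019, 2019],
                           'Population': [1000, 2000]})
merged = pd.merge(crimes, population, how='left',
                  left_on=['Falls within', 'Year'],
                  right_on=['Area', 'Year'], indicator=True)
unmatched = (merged['_merge'] != 'both').sum()  # 0 means every row matched
```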
#Normalising the crime counts for population
Crimes_norm_merge['Normalised Crimes per 1000 population'] =Crimes_norm_merge['Total Crimes Count'] / Crimes_norm_merge['Population']*1000
Crimes_norm_merge
| | Falls within | Year | Month of Crime | Crime type | Total Crimes Count | Area | Population | Normalised Crimes per 1000 population |
|---|---|---|---|---|---|---|---|---|
| 0 | Metropolitan Police Service | 2019 | 06 | Anti-social behaviour | 22075 | Metropolitan Police Service | 8952268 | 2.465856 |
| 1 | Metropolitan Police Service | 2019 | 06 | Bicycle theft | 1920 | Metropolitan Police Service | 8952268 | 0.214471 |
| 2 | Metropolitan Police Service | 2019 | 06 | Burglary | 6311 | Metropolitan Police Service | 8952268 | 0.704961 |
| 3 | Metropolitan Police Service | 2019 | 06 | Criminal damage and arson | 4596 | Metropolitan Police Service | 8952268 | 0.513389 |
| 4 | Metropolitan Police Service | 2019 | 06 | Drugs | 3736 | Metropolitan Police Service | 8952268 | 0.417324 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 247 | South Wales Police | 2021 | 08 | Robbery | 48 | South Wales Police | 1943476 | 0.024698 |
| 248 | South Wales Police | 2021 | 08 | Shoplifting | 490 | South Wales Police | 1943476 | 0.252126 |
| 249 | South Wales Police | 2021 | 08 | Theft from the person | 88 | South Wales Police | 1943476 | 0.045280 |
| 250 | South Wales Police | 2021 | 08 | Vehicle crime | 466 | South Wales Police | 1943476 | 0.239777 |
| 251 | South Wales Police | 2021 | 08 | Violence and sexual offences | 4114 | South Wales Police | 1943476 | 2.116826 |
252 rows × 8 columns
#Aggregating the normalised data to create a data frame to draw a bar chart
Crimes_norm2 = Crimes_norm_merge.groupby(['Falls within','Year']).agg({'Normalised Crimes per 1000 population': ['sum']})
Crimes_norm2.columns = ['Crimes per 1000 population']
Crimes_norm2 = Crimes_norm2.reset_index()
Crimes_norm2.sort_values(by=['Falls within','Year'], inplace=True)
print(Crimes_norm2)
                  Falls within  Year  Crimes per 1000 population
0  Metropolitan Police Service  2019                   32.037133
1  Metropolitan Police Service  2020                   34.325228
2  Metropolitan Police Service  2021                   31.070505
3           South Wales Police  2019                   19.334392
4           South Wales Police  2020                   20.204520
5           South Wales Police  2021                   18.474115
#Drawing the bar chart to visualise normalised crime counts
fig = px.bar(Crimes_norm2, x="Year", color="Falls within",
y='Crimes per 1000 population',
title="Variation in Crimes per 1000 Population over Summer Months during 2019, 2020 and 2021",
barmode='group',
height=600)
fig.show()
The total-crimes line chart by year drawn earlier for South Wales showed that the summer months of 2019, 2020 and 2021 had similar crime counts. The hypothesis below was tested to check whether the median monthly crime counts during the summer months are equal in the South Wales Police area.
To test this hypothesis, the non-parametric Kruskal-Wallis test was used.
Code from https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-sheet/ was adapted and re-used to conduct the Kruskal-Wallis test.
#Filtering, Grouping by and aggregating the data set to subset the sample data sets for hypothesis testing
#Filtering the data
SWCrimes = Crimes2[Crimes2['Falls within']=='South Wales Police']
#Grouping by and aggregating the data
SW = SWCrimes.groupby(['Year','Month of Crime']).agg({'Crime type': ['count']})
SW.columns = ['Crimes Count']
SW = SW.reset_index()
#filtering records of 2019 summer months
SW2019 = SW[SW['Year']=='2019']
#Set of 2019 summer months' crime counts
S_2019 = SW2019['Crimes Count']
#filtering records of 2020 summer months
SW2020 = SW[SW['Year']=='2020']
#Set of 2020 summer months' crime counts
S_2020 = SW2020['Crimes Count']
#filtering records of 2021 summer months
SW2021 = SW[SW['Year']=='2021']
#Set of 2021 summer months' crime counts
S_2021 = SW2021['Crimes Count']
##Kruskal-Wallis H Test
#Importing the package required to perform the statistical test
from scipy.stats import kruskal
#Defining the samples and the test
stat, p = kruskal(S_2019,S_2020,S_2021)
#Printing the test statistic and corresponding p-value of the test outcome
print('stat=%.3f, p=%.3f' % (stat, p))
#Printing the result
if p > 0.05:
    print('p-value > alpha=0.05, so the null hypothesis cannot be rejected: there is no evidence that the median monthly no. of crimes in the South Wales Police area differed across the 3 summers')
else:
    print('p-value < alpha=0.05. Therefore, reject Ho')
stat=5.689, p=0.058
p-value > alpha=0.05, so the null hypothesis cannot be rejected: there is no evidence that the median monthly no. of crimes in the South Wales Police area differed across the 3 summers
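For reference, the same statistic can be recomputed directly from the monthly South Wales totals printed earlier, which makes the test reproducible without re-deriving the data frames:

```python
from scipy.stats import kruskal

# Monthly crime counts for South Wales Police, copied from the
# aggregated table printed earlier in this analysis
s_2019 = [12068, 13063, 12254]
s_2020 = [13357, 13098, 12812]
s_2021 = [12164, 11991, 11749]
stat, p = kruskal(s_2019, s_2020, s_2021)  # stat≈5.689, p≈0.058
```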
Although this is a dedicated section for visualisations, many visualisations were also produced while reporting on and analysing the data sets. In this section, pie charts are the main focus.
#Merging the data sets to get the necessary fields for % calculations into one data set
CrimesMet_merge = pd.merge(Crimes6_Met, Crimes5, how='left', left_on=['Falls within','Year'], right_on = ['Falls within','Year'])
CrimesMet_merge.head()
| | Falls within | Year | Crime type | Total Crimes Count_x | Total Crimes Count_y |
|---|---|---|---|---|---|
| 0 | Metropolitan Police Service | 2019 | Anti-social behaviour | 71360 | 286805 |
| 1 | Metropolitan Police Service | 2019 | Bicycle theft | 5730 | 286805 |
| 2 | Metropolitan Police Service | 2019 | Burglary | 18328 | 286805 |
| 3 | Metropolitan Police Service | 2019 | Criminal damage and arson | 13070 | 286805 |
| 4 | Metropolitan Police Service | 2019 | Drugs | 12195 | 286805 |
#Calculating the % of crimes
CrimesMet_merge['% of Crimes from Total'] =CrimesMet_merge['Total Crimes Count_x'] / CrimesMet_merge['Total Crimes Count_y']*100
CrimesMet_merge.head()
| | Falls within | Year | Crime type | Total Crimes Count_x | Total Crimes Count_y | % of Crimes from Total |
|---|---|---|---|---|---|---|
| 0 | Metropolitan Police Service | 2019 | Anti-social behaviour | 71360 | 286805 | 24.881017 |
| 1 | Metropolitan Police Service | 2019 | Bicycle theft | 5730 | 286805 | 1.997873 |
| 2 | Metropolitan Police Service | 2019 | Burglary | 18328 | 286805 | 6.390405 |
| 3 | Metropolitan Police Service | 2019 | Criminal damage and arson | 13070 | 286805 | 4.557103 |
| 4 | Metropolitan Police Service | 2019 | Drugs | 12195 | 286805 | 4.252018 |
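The merge above works, but the per-year totals can also be attached in one step with `groupby(...).transform('sum')`, avoiding the second frame and the _x/_y suffixes. A sketch on made-up numbers, not the frames used above:

```python
import pandas as pd

# Toy frame standing in for Crimes6_Met
df = pd.DataFrame({
    'Year': ['2019', '2019', '2020'],
    'Crime type': ['Burglary', 'Drugs', 'Burglary'],
    'Total Crimes Count': [30, 70, 50],
})
# transform('sum') broadcasts each year's total back onto its rows
yearly_total = df.groupby('Year')['Total Crimes Count'].transform('sum')
df['% of Crimes from Total'] = df['Total Crimes Count'] / yearly_total * 100
```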
As per the previous illustrations, across both police areas and all periods, the Metropolitan Police Service area recorded the highest number of crimes during the summer of 2020. Below is a pie chart depicting the percentage of crimes by type out of the total number of crimes that happened in the area during the summer of 2020. The web link https://waynestalk.com/en/python-pie-donut-sunburst-charts-en/ was referred to in compiling the below code.
#Drawing the pie chart
import pandas as pd
import plotly.express as px
CrimesMet_merge_2020 = CrimesMet_merge[CrimesMet_merge['Year']== '2020']
fig = px.pie(CrimesMet_merge_2020, values='% of Crimes from Total', names='Crime type', title='% Breakdown of Crimes of Metropolitan Police Area in Summer 2020')
fig.show()
The slices in this pie chart drawn for 2020 can be compared with those of the pie charts below drawn for 2019 and 2021. The subplot of pie charts below was drawn by adapting code from https://plotly.com/python/pie-charts/
#Importing the required packages
import plotly.graph_objects as go
from plotly.subplots import make_subplots
#filtering the two data sets for the two years
CrimesMet_merge_2019 = CrimesMet_merge[CrimesMet_merge['Year']== '2019']
CrimesMet_merge_2021 = CrimesMet_merge[CrimesMet_merge['Year']== '2021']
#Defining the labels
labels = CrimesMet_merge_2019['Crime type']
# Creating subplots: using 'domain' type for Pie subplot
fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])
fig.add_trace(go.Pie(labels=labels, values=CrimesMet_merge_2019['% of Crimes from Total'] , name="2019"),
1, 1)
fig.add_trace(go.Pie(labels=labels, values=CrimesMet_merge_2021['% of Crimes from Total'] , name="2021"),
1, 2)
#Using `hole` to create a donut-like pie chart so that the year could be displayed inside
fig.update_traces(hole=.4, hoverinfo="label+percent+name")
fig.update_layout(
title_text="% Breakdown of Crimes in 2019 and 2021 in Metropolitan Police Area",
# Add annotations in the center of the donut pies.
annotations=[dict(text='2019', x=0.18, y=0.5, font_size=20, showarrow=False),
dict(text='2021', x=0.82, y=0.5, font_size=20, showarrow=False)])
fig.show()
#Drawing a heatmap of crime counts by police area and month, adapting code from https://cmdlinetips.com/2019/01/how-to-make-heatmap-with-seaborn-in-python/
mapdata = Crimes2.groupby(['Falls within','Month', 'Crime type']).agg({'Crime type': ['count']})
mapdata.columns = ['Crimes Count']
mapdata = mapdata.reset_index()
import matplotlib.pyplot as plt
heatmap_data = pd.pivot_table(mapdata, values='Crimes Count',
index=['Falls within'],
columns='Month')
sns.heatmap(heatmap_data, cmap="YlGnBu")